Abstract. We study the geometric structure of the statistical models for
two-by-two contingency tables. One or two odds ratios are fixed and the 
corresponding models are shown to be a portion of a ruled quadratic surface 
or a segment. Some pointers to the general case of two-way contingency tables 
are also given and an application to case-control studies is presented. 



1. Introduction 

A two-way contingency table gives the joint distribution of two random variables with a finite number of outcomes. If we denote by $\{0, \dots, I-1\}$ and $\{0, \dots, J-1\}$ the outcomes of $X_1$ and $X_2$ respectively, the contingency table is represented by a matrix $P = (p_{ij})$, where $p_{ij}$ is the probability that $X_1 = i$ and $X_2 = j$. The table $P$ is also called an $I \times J$ contingency table, in order to emphasize that the variable $X_1$ has $I$ outcomes and the variable $X_2$ has $J$ outcomes.

In the analysis of contingency tables, odds ratios, or cross-product ratios, are major parameters, and their use in the study of $2 \times 2$ tables goes back to the 1970's. For an explicit discussion of this approach see, e.g., [Fie80].

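For a $2 \times 2$ table of the form:

$$
P = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix} \qquad (1)
$$

there is only one cross-product ratio, namely:

$$
r = \frac{p_{00}\,p_{11}}{p_{01}\,p_{10}}.
$$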

In the general $I \times J$ case, there is one cross-product ratio for each $2 \times 2$ submatrix of the table. Thus, they have the form

$$
\frac{p_{ij}\,p_{kh}}{p_{ih}\,p_{kj}}
$$

for $0 \le i < k \le I-1$ and $0 \le j < h \le J-1$, see [Agr02, Chapter 2]. In this paper we will consider the cross-product ratio and other naturally defined ratios.

Odds ratios are used in a wide range of applications, in particular in case-control studies in pharmaceutical and medical research. Following the theory of log-linear models, statistical inference for the odds ratios is based on asymptotic normality, see for example [BFH75]. More recently, some methods for exact inference have been introduced, see [Agr02] and [Agr01] for details and further references. For the Bayesian approach, see [Lin64].

From the point of view of Probability and Mathematical Statistics, different descriptions of the geometry of the statistical models for contingency tables are presented in [Col80, Chapter 2] and in [BFH75, Section 2.7], using vector space theory. An earlier approach to the geometry of contingency tables with fixed cross-product ratio can be found in [FG70]. In the last few years, the introduction of techniques from Commutative Algebra has given a new flavor to the geometric representation of statistical models, as shown in, e.g., [PRW01a, Chapter 6], [PRW01b], |(mKMm| and [Sla04].

In this paper we use Algebraic and Geometric techniques in order to describe the 
structure of some models for two-way contingency tables described through odds 
ratios. 

We first consider the case of $2 \times 2$ contingency tables of the form (1) with the constraints $p_{ij} > 0$ for all $i, j = 0, 1$ and $p_{00} + p_{01} + p_{10} + p_{11} = 1$. If we allowed some probabilities to be zero, the ratios would be either zero or undefined. Thus we restrict the analysis to the strictly positive case.

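In a $2 \times 2$ table we consider the three odds ratios:

$$
r_\times = \frac{p_{00}\,p_{11}}{p_{01}\,p_{10}}, \qquad
r_\parallel = \frac{p_{00}\,p_{10}}{p_{01}\,p_{11}}, \qquad
r_= = \frac{p_{00}\,p_{01}}{p_{10}\,p_{11}}.
$$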

The meaning of the three odds ratios above will be fully explained in Section 4.

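Let $r_\times = \alpha^2$, $r_\parallel = \beta^2$ and $r_= = \gamma^2$, with $\alpha, \beta, \gamma > 0$. For further use, it is useful to make explicit the following identities. Considering $r_=$ and $r_\parallel$, it is easy to check that $r_\parallel r_= = p_{00}^2/p_{11}^2$ and $r_=/r_\parallel = p_{01}^2/p_{10}^2$; hence

$$
\beta\gamma = \frac{p_{00}}{p_{11}} \qquad (2)
$$

and

$$
\frac{\gamma}{\beta} = \frac{p_{01}}{p_{10}}. \qquad (3)
$$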
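As a quick numerical sanity check of the three definitions and of the identities $\beta\gamma = p_{00}/p_{11}$ and $\gamma/\beta = p_{01}/p_{10}$, the following Python sketch computes the odds ratios of a strictly positive $2 \times 2$ table. The helper name `odds_ratios` and the example table are hypothetical, chosen only for illustration.

```python
import math


def odds_ratios(p00, p01, p10, p11):
    """Hypothetical helper: return (r_x, r_par, r_eq) for a strictly positive 2x2 table.

    r_x   = p00*p11 / (p01*p10)   (cross-product ratio)
    r_par = p00*p10 / (p01*p11)
    r_eq  = p00*p01 / (p10*p11)
    """
    if min(p00, p01, p10, p11) <= 0:
        raise ValueError("all cells must be strictly positive")
    r_x = (p00 * p11) / (p01 * p10)
    r_par = (p00 * p10) / (p01 * p11)
    r_eq = (p00 * p01) / (p10 * p11)
    return r_x, r_par, r_eq


# Illustrative table (p00, p01, p10, p11) with entries summing to 1.
table = (0.4, 0.1, 0.2, 0.3)
r_x, r_par, r_eq = odds_ratios(*table)

# Positive square roots: alpha^2 = r_x, beta^2 = r_par, gamma^2 = r_eq.
alpha, beta, gamma = math.sqrt(r_x), math.sqrt(r_par), math.sqrt(r_eq)

# beta * gamma = p00 / p11
assert math.isclose(beta * gamma, table[0] / table[3])
# gamma / beta = p01 / p10
assert math.isclose(gamma / beta, table[1] / table[2])
print(r_x, r_par, r_eq)
```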

In Section 2 we study the geometric properties of some statistical models for $2 \times 2$ contingency tables. We consider models obtained by fixing two odds ratios, showing that the model is represented by a segment in the probability simplex and studying the behavior of the third ratio. In particular, an expression for tables with three fixed ratios is derived. We also recover classical results about models with a fixed odds ratio. In Section 3 we give a glimpse of the general situation of $I \times J$ contingency tables. We focus our attention on $2 \times 3$ tables and present some of the difficulties arising in the general case. An application to case-control studies is presented in Section 4.

2. Odds Ratios 

In this section, we use basic geometric techniques to study the $2 \times 2$ tables having two out of the three ratios $r_\times$, $r_=$ and $r_\parallel$ fixed.

We consider a $2 \times 2$ matrix as a point in the real affine 4-space $\mathbb{A}^4$. In particular, with the notation of Equation (1), the $p_{ij}$'s are coordinates in $\mathbb{A}^4$. A $2 \times 2$ table is a matrix in the open probability simplex

$$
\Delta = \Big\{ P = (p_{ij}) \in \mathbb{A}^4 : \sum_{i,j} p_{ij} = 1,\ p_{ij} > 0,\ i,j = 0,1 \Big\}.
$$

As our goal is to describe odds ratios for tables, we may assume the ratios to be positive real numbers.
Having fixed the first two ratios

$$
r_\times = \alpha^2 \quad\text{and}\quad r_\parallel = \beta^2,
$$

we want to answer the following questions:

Q1: How can we describe the locus of tables having the two assigned ratios?

Q2: What are the possible values of the third ratio?

These questions were posed in the AIM computational algebraic statistics plenary lecture by Stephen Fienberg in 2003, where some interesting comments on how to treat questions Q1 and Q2 were also made.

Consider the quadratic hypersurfaces of $\mathbb{A}^4$:

$$
Q_\alpha : \alpha^2 p_{01}p_{10} = p_{00}p_{11}
$$

and

$$
Q_\beta : \beta^2 p_{01}p_{11} = p_{00}p_{10}.
$$

Notice that a matrix in $Q_\alpha \cap Q_\beta$ satisfies $r_\times = \alpha^2$ and $r_\parallel = \beta^2$ as soon as the ratios are defined. Hence, to answer the first question, it is enough to study

$$
Q_\alpha \cap Q_\beta \setminus Z,
$$

where $Z = \{P = (p_{ij}) \in \mathbb{A}^4 : p_{00}p_{01}p_{10}p_{11} = 0\}$.

We readily see that $Q_\alpha \cap Q_\beta$ contains the 2-dimensional skew linear spaces

$$
p_{00} = p_{01} = 0 \quad\text{and}\quad p_{10} = p_{11} = 0,
$$

and by general facts on quadrics (see [Har92, page 301]) we know that there exist two more 2-dimensional skew linear spaces, $R$ and $S$, such that

$$
Q_\alpha \cap Q_\beta = \{p_{00} = p_{01} = 0\} \cup \{p_{10} = p_{11} = 0\} \cup R \cup S.
$$
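Manipulating the equations, we notice that a point in $Q_\alpha \cap Q_\beta \setminus Z$ satisfies

$$
\frac{p_{00}}{p_{01}} = \alpha^2\,\frac{p_{10}}{p_{11}} = \beta^2\,\frac{p_{11}}{p_{10}}
$$

and

$$
\frac{p_{10}}{p_{11}} = \beta^2\,\frac{p_{01}}{p_{00}} = \frac{1}{\alpha^2}\,\frac{p_{00}}{p_{01}}.
$$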
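Hence, $R$ and $S$ lie in the intersection of the two loci

$$
(\alpha p_{10} - \beta p_{11})(\alpha p_{10} + \beta p_{11}) = 0
$$

and

$$
\Big(\beta p_{01} - \frac{1}{\alpha}p_{00}\Big)\Big(\beta p_{01} + \frac{1}{\alpha}p_{00}\Big) = 0,
$$

obtained from the two chains of equalities above (the first gives $\alpha^2 p_{10}^2 = \beta^2 p_{11}^2$, the second $\beta^2 p_{01}^2 = p_{00}^2/\alpha^2$). Each locus is the union of two 3-dimensional linear spaces; here $\alpha$ and $\beta$ are chosen to be positive. Only two out of the four resulting 2-dimensional linear spaces lie in both $Q_\alpha$ and $Q_\beta$, and these are $R$ and $S$:

$$
R : \alpha p_{10} - \beta p_{11} = \beta p_{01} - \frac{1}{\alpha}p_{00} = 0,
$$

$$
S : \alpha p_{10} + \beta p_{11} = \beta p_{01} + \frac{1}{\alpha}p_{00} = 0,
$$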

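which have parametric presentations

$$
R = \Big\{\Big(\beta u,\ \frac{u}{\alpha},\ \beta v,\ \alpha v\Big) : u, v \in \mathbb{R}\Big\},
$$

$$
S = \Big\{\Big(\beta u,\ -\frac{u}{\alpha},\ \beta v,\ -\alpha v\Big) : u, v \in \mathbb{R}\Big\},
$$

where the coordinates are ordered as $(p_{00}, p_{01}, p_{10}, p_{11})$.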

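Summing all these facts up, we get

Proposition 2.1. Fix the ratios $r_\times = \alpha^2$ and $r_\parallel = \beta^2$. Then, a matrix has the given ratios if and only if it has the form

$$
\begin{pmatrix} \beta u & \frac{1}{\alpha}u \\ \beta v & \alpha v \end{pmatrix}
\qquad\text{or}\qquad
\begin{pmatrix} \beta u & -\frac{1}{\alpha}u \\ \beta v & -\alpha v \end{pmatrix},
$$

with $u, v$ non-zero real parameters.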
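Proposition 2.1 can be checked numerically. The Python sketch below (helper names hypothetical) evaluates the defining equations of $Q_\alpha$ and $Q_\beta$ at sample points of $R$ and $S$, recovers $r_\times = \alpha^2$ and $r_\parallel = \beta^2$, and confirms that on $S$ the entries $p_{00}$ and $p_{01}$ always have opposite signs when $\alpha, \beta > 0$. Coordinates are ordered as $(p_{00}, p_{01}, p_{10}, p_{11})$.

```python
import math


def point_on_R(alpha, beta, u, v):
    """Hypothetical helper: point (p00, p01, p10, p11) of the plane R."""
    return (beta * u, u / alpha, beta * v, alpha * v)


def point_on_S(alpha, beta, u, v):
    """Hypothetical helper: point (p00, p01, p10, p11) of the plane S."""
    return (beta * u, -u / alpha, beta * v, -alpha * v)


def residuals(p, alpha, beta):
    """Residuals of Q_alpha: alpha^2 p01 p10 = p00 p11 and Q_beta: beta^2 p01 p11 = p00 p10."""
    p00, p01, p10, p11 = p
    return (alpha**2 * p01 * p10 - p00 * p11,
            beta**2 * p01 * p11 - p00 * p10)


alpha, beta = 2.0, 0.5
samples = [(1.0, 1.0), (-0.3, 2.5), (4.0, -0.7)]   # non-zero (u, v)
for u, v in samples:
    for p in (point_on_R(alpha, beta, u, v), point_on_S(alpha, beta, u, v)):
        q_alpha, q_beta = residuals(p, alpha, beta)
        assert abs(q_alpha) < 1e-12 and abs(q_beta) < 1e-12
        # Off Z both ratios are recovered: r_x = alpha^2, r_par = beta^2.
        p00, p01, p10, p11 = p
        assert math.isclose(p00 * p11 / (p01 * p10), alpha**2)
        assert math.isclose(p00 * p10 / (p01 * p11), beta**2)
    # On S, p00 * p01 = -beta * u**2 / alpha < 0, so S misses the open simplex.
    s = point_on_S(alpha, beta, u, v)
    assert s[0] * s[1] < 0
```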



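Finally, we have to intersect $R$ and $S$ with the probability simplex. As we can choose $\alpha$ and $\beta$ to be positive, we immediately see that $S \cap \Delta = \emptyset$ (on $S$ the entries $p_{00}$ and $p_{01}$ have opposite signs, so there is always a non-positive coordinate).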

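To determine $R \cap \Delta$, notice that the entries of a point of $R$ sum to $\frac{\alpha\beta+1}{\alpha}u + (\alpha+\beta)v$, so $R \cap \{\sum p_{ij} = 1\}$ is obtained by taking

$$
u = \frac{\alpha\,[1 - (\beta+\alpha)v]}{\alpha\beta + 1}
$$

in the parametric presentation of $R$. Hence, we get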



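Proposition 2.2. Fix the ratios $r_\times = \alpha^2$ and $r_\parallel = \beta^2$. Then, a table has the given ratios if and only if it has the form

$$
\begin{pmatrix}
\frac{\alpha\beta}{\alpha\beta+1}\,[1-(\beta+\alpha)v] & \frac{1}{\alpha\beta+1}\,[1-(\beta+\alpha)v] \\
\beta v & \alpha v
\end{pmatrix},
$$

where $0 < v < \frac{1}{\alpha+\beta}$.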

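This answers question Q1: having fixed the two ratios, the tables with those ratios describe a segment in the probability simplex (an open segment, since every entry depends affinely on $v$ and $v$ ranges over an open interval).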
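A minimal Python sketch of Proposition 2.2, assuming $\alpha, \beta > 0$; the function name `table_prop_2_2` is hypothetical. It builds the table for a given $v$, checks that the entries are positive and sum to one, and that $r_\times = \alpha^2$ and $r_\parallel = \beta^2$. Because every entry is an affine function of $v$, the midpoint check at the end is a direct way to see that the locus is a segment.

```python
import math


def table_prop_2_2(alpha, beta, v):
    """Hypothetical helper: table (p00, p01, p10, p11) with r_x = alpha^2, r_par = beta^2.

    p00 = alpha*beta/(alpha*beta+1) * [1-(alpha+beta)v]
    p01 = 1/(alpha*beta+1)          * [1-(alpha+beta)v]
    p10 = beta*v,  p11 = alpha*v,   with 0 < v < 1/(alpha+beta).
    """
    if not 0 < v < 1 / (alpha + beta):
        raise ValueError("v must lie in (0, 1/(alpha + beta))")
    w = 1 - (alpha + beta) * v
    k = alpha * beta + 1
    return (alpha * beta * w / k, w / k, beta * v, alpha * v)


alpha, beta = 2.0, 0.5            # r_x = 4, r_par = 0.25
v_max = 1 / (alpha + beta)
for frac in (0.1, 0.5, 0.9):
    p00, p01, p10, p11 = table_prop_2_2(alpha, beta, frac * v_max)
    assert min(p00, p01, p10, p11) > 0
    assert math.isclose(p00 + p01 + p10 + p11, 1.0)
    assert math.isclose(p00 * p11 / (p01 * p10), alpha**2)
    assert math.isclose(p00 * p10 / (p01 * p11), beta**2)

# Entries are affine in v: the table at the midpoint of two parameter values
# is the midpoint of the two tables, i.e. the locus is a segment.
v1, v2 = 0.2 * v_max, 0.7 * v_max
t1, t2 = table_prop_2_2(alpha, beta, v1), table_prop_2_2(alpha, beta, v2)
tm = table_prop_2_2(alpha, beta, (v1 + v2) / 2)
assert all(math.isclose(m, (a + b) / 2) for m, a, b in zip(tm, t1, t2))
```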

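Remark 2.3. In [BFH75, Section 2.7], a parametric description of the tables with $r_\times = 1$ is written in the form

$$
\begin{pmatrix} st & s(1-t) \\ (1-s)t & (1-s)(1-t) \end{pmatrix}. \qquad (4)
$$

Let us check that our parametrization contains this as a special case. In order to do this, we set $\alpha = 1$ in Proposition 2.2 and compute the marginal sums of the resulting table:

$$
\begin{array}{cc|c}
\frac{\beta}{\beta+1}\,[1-(\beta+1)v] & \frac{1}{\beta+1}\,[1-(\beta+1)v] & 1-(\beta+1)v \\
\beta v & v & (\beta+1)v \\ \hline
\frac{\beta}{\beta+1} & \frac{1}{\beta+1} & 1
\end{array} \qquad (5)
$$

Hence, the parametrizations in Equations (4) and (5) are just the same: simply let $t = \frac{\beta}{\beta+1}$ and $s = 1 - (\beta+1)v$.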
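The identification in Remark 2.3 can also be checked numerically. The Python sketch below (helper names hypothetical) compares the table of Proposition 2.2 with $\alpha = 1$ against the form $\begin{pmatrix} st & s(1-t) \\ (1-s)t & (1-s)(1-t) \end{pmatrix}$ of [BFH75, Section 2.7] under the substitution $t = \beta/(\beta+1)$, $s = 1-(\beta+1)v$.

```python
import math


def bfh_table(s, t):
    """Hypothetical helper: table with row-0 margin s and column-0 margin t (r_x = 1)."""
    return (s * t, s * (1 - t), (1 - s) * t, (1 - s) * (1 - t))


def prop_2_2_alpha_one(beta, v):
    """Hypothetical helper: table of Proposition 2.2 with alpha = 1, i.e. r_x = 1."""
    w = 1 - (beta + 1) * v
    return (beta * w / (beta + 1), w / (beta + 1), beta * v, v)


# Each v is chosen inside (0, 1/(beta + 1)).
for beta, v in [(3.0, 0.1), (0.5, 0.4), (1.0, 0.3)]:
    s = 1 - (beta + 1) * v          # row-0 marginal sum
    t = beta / (beta + 1)           # column-0 marginal sum
    ours, theirs = prop_2_2_alpha_one(beta, v), bfh_table(s, t)
    assert all(math.isclose(a, b) for a, b in zip(ours, theirs))
```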

Remark 2.4. Suppose we fix $r_\times$ and ask for a geometric description of the locus of tables with this ratio. Using Proposition 2.2 we can easily get an answer. For each value of $r_\parallel$ we get a segment of tables, and, letting $r_\parallel$ vary, this segment describes a portion of a ruled quadratic surface. Notice that, for $r_\times = 1$, this is the result contained in [BFH75, Section 2.7]. In particular, we recall that the matrices with $r_\times$ fixed form a so-called Segre variety (i.e., in this case, a smooth quadric surface in projective three-space). For more on this see, e.g., [2SS05].
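Answering question Q2 is just a computation, and we see that

$$
r_= = \left( \frac{1}{\alpha\beta+1}\,\frac{1-(\beta+\alpha)v}{v} \right)^2,
$$

where $r_\times = \alpha^2$ and $r_\parallel = \beta^2$. Notice that, having fixed $r_\times$ and $r_\parallel$, the third ratio can freely vary in $(0, +\infty)$: as $v$ ranges over $(0, \frac{1}{\alpha+\beta})$, the right-hand side decreases continuously from $+\infty$ to $0$.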

Remark 2.5. We expressed $r_=$ as a function $r_=(\alpha, \beta, v)$, and standard computations show that this is an invertible function of $v$. In particular, we get

$$
v = \frac{1}{\alpha + \beta + (\alpha\beta+1)\sqrt{r_=}}.
$$

Thus, given $r_\times = \alpha^2$, $r_\parallel = \beta^2$ and $r_=$, we have an explicit description of the unique table with these ratios (use Proposition 2.2).
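Since Remark 2.5 gives a closed-form inverse, the unique table with three prescribed positive ratios can be computed directly. A minimal Python sketch (the function names are hypothetical) uses $v = 1/(\alpha+\beta+(\alpha\beta+1)\gamma)$ with $\gamma = \sqrt{r_=}$, obtained by solving $\gamma = [1-(\alpha+\beta)v]/((\alpha\beta+1)v)$ for $v$, and then the table of Proposition 2.2. Round-tripping a positive table through its three ratios should give back the same table.

```python
import math


def table_from_ratios(r_x, r_par, r_eq):
    """Hypothetical helper: unique positive 2x2 table (p00, p01, p10, p11) with the given ratios."""
    alpha, beta, gamma = math.sqrt(r_x), math.sqrt(r_par), math.sqrt(r_eq)
    k = alpha * beta + 1
    v = 1 / (alpha + beta + k * gamma)     # inverse of r_eq as a function of v
    w = 1 - (alpha + beta) * v
    return (alpha * beta * w / k, w / k, beta * v, alpha * v)


def ratios(p00, p01, p10, p11):
    """Hypothetical helper: (r_x, r_par, r_eq) of a positive 2x2 table."""
    return (p00 * p11 / (p01 * p10),
            p00 * p10 / (p01 * p11),
            p00 * p01 / (p10 * p11))


# Round trip: table -> ratios -> table.
original = (0.4, 0.1, 0.2, 0.3)
rebuilt = table_from_ratios(*ratios(*original))
assert all(math.isclose(a, b) for a, b in zip(original, rebuilt))

# An arbitrary positive triple of ratios is attained by a probability table.
p = table_from_ratios(4.0, 0.25, 9.0)
assert math.isclose(sum(p), 1.0) and min(p) > 0
assert all(math.isclose(a, b) for a, b in zip(ratios(*p), (4.0, 0.25, 9.0)))
```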

Clearly, completely analogous results hold if we fix the ratios r x and r—. 

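If we fix the ratios $r_\parallel = \beta^2$ and $r_= = \gamma^2$ and argue as above, we get the following:

Proposition 2.6. Fix the ratios $r_\parallel = \beta^2$ and $r_= = \gamma^2$. Then, a table has the given ratios if and only if it has the form

$$
\begin{pmatrix}
\frac{\beta\gamma}{\beta\gamma+1}\,[1-(\beta+\gamma)v] & \gamma v \\
\beta v & \frac{1}{\beta\gamma+1}\,[1-(\beta+\gamma)v]
\end{pmatrix},
$$

where $0 < v < \frac{1}{\beta+\gamma}$.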



Again, a trivial computation yields:

$$
r_\times = \left( \frac{1}{\beta\gamma+1}\,\frac{1-(\beta+\gamma)v}{v} \right)^2,
$$

and hence, having fixed $r_=$ and $r_\parallel$, the third ratio can freely vary in $(0, +\infty)$; see Remark 2.5.
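An analogous Python sketch for the case of fixed $r_\parallel = \beta^2$ and $r_= = \gamma^2$. It assumes the parametrization $p_{10} = \beta v$, $p_{01} = \gamma v$, $p_{00} = \beta\gamma\,p_{11}$, which follows from identities (2) and (3) together with the constraint that the entries sum to one; the function name is hypothetical. The check confirms the two fixed ratios, the closed form for the free ratio $r_\times$, and that $r_\times$ decreases strictly as $v$ crosses $(0, 1/(\beta+\gamma))$.

```python
import math


def table_fixed_par_eq(beta, gamma, v):
    """Hypothetical helper: table (p00, p01, p10, p11) with r_par = beta^2, r_eq = gamma^2."""
    if not 0 < v < 1 / (beta + gamma):
        raise ValueError("v must lie in (0, 1/(beta + gamma))")
    w = 1 - (beta + gamma) * v
    k = beta * gamma + 1
    return (beta * gamma * w / k, gamma * v, beta * v, w / k)


beta, gamma = 1.5, 0.5
v_max = 1 / (beta + gamma)
r_x_values = []
for frac in (0.001, 0.25, 0.5, 0.75, 0.999):
    v = frac * v_max
    p00, p01, p10, p11 = table_fixed_par_eq(beta, gamma, v)
    assert math.isclose(p00 + p01 + p10 + p11, 1.0)
    assert math.isclose(p00 * p10 / (p01 * p11), beta**2)   # r_par
    assert math.isclose(p00 * p01 / (p10 * p11), gamma**2)  # r_eq
    r_x = p00 * p11 / (p01 * p10)
    # Closed form: r_x = ((1 - (beta+gamma) v) / ((beta*gamma + 1) v))^2
    assert math.isclose(r_x, ((1 - (beta + gamma) * v) / ((beta * gamma + 1) * v)) ** 2)
    r_x_values.append(r_x)

# r_x is strictly decreasing in v: it sweeps (0, +inf) across the interval.
assert all(a > b for a, b in zip(r_x_values, r_x_values[1:]))
print(r_x_values[0], r_x_values[-1])
```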

Remark 2.7. In recent literature, increasing attention has been paid to the geometric structure of probability models for contingency tables. In particular, in [Sla04, Chapter 3] the author presents some results about the geometric characterization of probability models for $2 \times 2$ contingency tables in terms of the cross-product ratio and the conditional distributions. In the same work the connections between the odds ratios and the classical log-linear and ANOVA-type representations of the probability models are clearly stated. We remark that our notation differs slightly from the one used by A. Slavkovic in her Ph.D. dissertation.

In the same direction, [LWJ04] presents the graphical visualization of joint, marginal and conditional distributions on the probability simplex for $2 \times 2$ contingency tables.

3. The 2x3 case 

The study of tables with more than two rows or columns would be of great interest, but the complexity of the problem quickly increases, as we show in the $2 \times 3$ case.

Consider the $2 \times 3$ matrix

$$
\begin{pmatrix} p_{00} & p_{01} & p_{02} \\ p_{10} & p_{11} & p_{12} \end{pmatrix}
$$

and define odds ratios as above for each $2 \times 2$ submatrix. We complete our previous notation by adding a superscript to denote the deleted column, e.g.

$$
r_=^{(1)} = \frac{p_{00}\,p_{02}}{p_{10}\,p_{12}}.
$$

Again, we consider a matrix as a point in a real affine space, in this case $\mathbb{A}^6$. Notice that the ratios are well defined for matrices in $\mathbb{A}^6 \setminus Z$, where $Z$ denotes the set of matrices with at least one zero entry.

Relations among the ratios are the cause of the increased complexity of the higher-dimensional cases. For example, as we will see, two of the ratios can always be freely fixed. But, as soon as three ratios are considered, constraints come into the picture.

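Easy calculations show that the following relations hold:

$$
r_\times^{(0)}\, r_\times^{(2)} = r_\times^{(1)},
$$

and also

$$
r_=^{(1)} = r_=^{(2)}\big(r_\times^{(0)}\big)^{-1}, \qquad
r_\parallel^{(2)} = r_\parallel^{(1)}\big(r_\parallel^{(0)}\big)^{-1}.
$$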
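Relations of this kind can be verified numerically on random strictly positive $2 \times 3$ tables. In the Python sketch below (names hypothetical), `ratio(P, kind, k)` computes the odds ratio of the $2 \times 2$ submatrix obtained by deleting column $k$, with `kind` one of `'x'`, `'par'`, `'eq'` standing for $r_\times$, $r_\parallel$, $r_=$. The identities checked are $r_\times^{(0)} r_\times^{(2)} = r_\times^{(1)}$, $r_=^{(1)} = r_=^{(2)}(r_\times^{(0)})^{-1}$ and $r_\parallel^{(2)} = r_\parallel^{(1)}(r_\parallel^{(0)})^{-1}$.

```python
import math
import random


def ratio(P, kind, k):
    """Hypothetical helper: odds ratio of the 2x3 table P with column k deleted.

    The remaining columns a < b play the role of columns 0 and 1 of a 2x2 table.
    """
    a, b = [j for j in range(3) if j != k]
    q00, q01, q10, q11 = P[0][a], P[0][b], P[1][a], P[1][b]
    if kind == "x":
        return q00 * q11 / (q01 * q10)
    if kind == "par":
        return q00 * q10 / (q01 * q11)
    if kind == "eq":
        return q00 * q01 / (q10 * q11)
    raise ValueError(f"unknown ratio kind: {kind!r}")


rng = random.Random(0)
for _ in range(100):
    cells = [rng.uniform(0.01, 1.0) for _ in range(6)]
    total = sum(cells)
    P = [[c / total for c in cells[:3]], [c / total for c in cells[3:]]]
    r = {(kind, k): ratio(P, kind, k) for kind in ("x", "par", "eq") for k in range(3)}
    assert math.isclose(r["x", 0] * r["x", 2], r["x", 1])
    assert math.isclose(r["eq", 1], r["eq", 2] / r["x", 0])
    assert math.isclose(r["par", 2], r["par", 1] / r["par", 0])
```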

These relations, besides producing constraints on the numerical choice of the ratios, lead to a much more complex geometric situation. We illustrate this by exhibiting some explicit examples (worked out with the Computer Algebra systems Singular and CoCoA). As references for the software, see [CoC04] and [GPS01].

More precisely, we fix some of the ratios and we describe the locus of matrices satisfying these relations in

$$
\Sigma^\circ = \Big\{P = (p_{ij}) \in \mathbb{A}^6 : \sum_{i,j} p_{ij} = 1\Big\} \setminus Z,
$$

i.e., the space of matrices with non-zero entries summing to one. For the sake of simplicity, we do not consider the positivity conditions defining the simplex.

In our geometric descriptions, we will slightly abuse terminology: e.g., we will call a line in $\Sigma^\circ$ a line in $\mathbb{A}^6$ not contained in $Z$; notice that our lines may have some holes (i.e., the points of intersection with $Z$).

We start by considering the easiest case where two of the ratios are fixed. Already 
at this stage, a dichotomy arises and we have two different situations, as shown in 
the following examples: 

(6) r ^=r^=l, 



(7) r ( x 1} = ri 2 ' = 1 or r« = r\f = 1 or = r[ 2) = 1 or r« = r£ 2 ) = 1 . 

The locus of matrices in $\Sigma^\circ$ satisfying one of conditions (7) is a 3-dimensional variety of degree 4, while condition (6) describes a 3-dimensional variety of degree 3. Roughly speaking, the degree (see [Har92, page 16] and [Sha95, page 41]) is a measure of the complexity of the variety. For a surface in 3-space, for example, the degree bounds the number of points in which a line not contained in the surface meets it (and equals that number for a general line, over the complex numbers); in a certain sense, it measures how the surface is folded.
Next, we try to fix three of the ratios, for example

(8) r ( x 0) = r« = r\f = 1 or r ( x 0) = r$ 

( 9 ) J°) - J 1 ) - ^(2) _ 



(10) r ( x 0) =4,r ( x 1} =3,r£ 2 ) =2. 

The locus of matrices in $\Sigma^\circ$ satisfying one of conditions (8) is the union of two quadratic surfaces, while condition (9) gives a plane. Moreover, if we consider the same ratios but vary their values, as in (10), the locus of matrices is now described by a single quadratic surface.

Finally, a glimpse of the case of four fixed ratios: 

(11) r W =r (i) =r (D =r (a) = lj 



(12) r ( x ° } = r ( x 1} = rL J) = r£ 2) - 1 , 

In both cases, the locus is described by a curve, as expected. However, condition (11) produces the union of four lines, while condition (12) is satisfied by a single line in $\Sigma^\circ$.

The Computer Algebra systems Singular and CoCoA were used to compute primary decompositions (giving the irreducible components of the loci) and Hilbert functions (giving the dimension and the degree of the loci).

4. An application. The case-control studies 

Two-by-two contingency tables are natural models for a large class of problems known, in the medical literature, as case-control studies. Let us consider a table coming, e.g., from the study of a new pharmaceutical product, or clinical test, designed for the detection of a disease. This is an example of a case-control study.

In a case-control study there are two random variables. The first variable $X_1$ encodes the presence (level 1) or absence (level 0) of the disease. The second variable $X_2$ encodes the result of the clinical test (level 1 if positive, level 0 if negative).

The joint variable $(X_1, X_2)$ has 4 outcomes, namely:

(0,0), (0,1), (1,0), (1,1).

Its probabilities form a $2 \times 2$ contingency table:

$$
\begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix}.
$$

The probabilities $p_{00}$ and $p_{11}$ are called the probability of true negative and of true positive, respectively. They correspond to the cases in which the clinical test gives the correct answer. The probabilities $p_{01}$ and $p_{10}$ are called the probability of false positive and of false negative, respectively. They correspond to the two types of error that can occur in a case-control study. For example, the probability of false negative is the probability that a diseased subject is incorrectly classified as not diseased.

A perfect clinical test, which correctly classifies all the subjects, would have $p_{01}$ and $p_{10}$ as low as possible, implying a large value of the odds ratio $r_\times$. Therefore, the odds ratio $r_\times$ measures the validity of the clinical test. In particular, when $r_\times = 1$, the random variables are statistically independent. In our framework this means that, when $r_\times = 1$, the result of the clinical test is independent of the presence or absence of the disease. Unless one obtains a large value of $r_\times$, the clinical test is judged as not effective. The odds ratio $r_\times$ is also called the Diagnostic Odds Ratio (DOR) in the medical literature.

In such a case-control study, two essential indices are the specificity and the sensitivity, defined as:

$$
\text{specificity} = \frac{p_{00}}{p_{00} + p_{01}}
\qquad\text{and}\qquad
\text{sensitivity} = \frac{p_{11}}{p_{10} + p_{11}}.
$$

Specificity is the proportion of true negatives among the non-diseased subjects, while sensitivity is the proportion of true positives among the diseased subjects. Straightforward computations show that

$$
r_\times = \frac{\text{specificity}/(1 - \text{specificity})}{(1 - \text{sensitivity})/\text{sensitivity}}.
$$

In view of the definitions above, it is easy to show that the relative magnitude of the sensitivity and specificity is measured by the odds ratio $r_\parallel$. In fact, one can show that

$$
\frac{\text{sensitivity}/(1 - \text{sensitivity})}{\text{specificity}/(1 - \text{specificity})} = \frac{1}{r_\parallel}.
$$

The ratio above is called the Error Odds Ratio (EOR).
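A minimal Python sketch of the indices of this section for a hypothetical table (the numbers are illustrative, not data from any study; the function name is also hypothetical). It computes specificity and sensitivity and checks the two identities stated above: the DOR equals $r_\times$ and the EOR equals $1/r_\parallel$.

```python
import math


def diagnostic_indices(p00, p01, p10, p11):
    """Hypothetical helper.

    Rows: disease absent (0) / present (1); columns: test negative (0) / positive (1).
    """
    specificity = p00 / (p00 + p01)   # true negatives among non-diseased subjects
    sensitivity = p11 / (p10 + p11)   # true positives among diseased subjects
    dor = (specificity / (1 - specificity)) / ((1 - sensitivity) / sensitivity)
    eor = (sensitivity / (1 - sensitivity)) / (specificity / (1 - specificity))
    return specificity, sensitivity, dor, eor


table = (0.45, 0.05, 0.10, 0.40)      # illustrative (p00, p01, p10, p11)
spec, sens, dor, eor = diagnostic_indices(*table)
p00, p01, p10, p11 = table
assert math.isclose(dor, p00 * p11 / (p01 * p10))        # DOR = r_x
assert math.isclose(eor, 1 / (p00 * p10 / (p01 * p11)))  # EOR = 1 / r_par
print(f"specificity={spec:.2f} sensitivity={sens:.2f} DOR={dor:.1f} EOR={eor:.3f}")
# prints: specificity=0.90 sensitivity=0.80 DOR=36.0 EOR=0.444
```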

In recent literature, the DOR and the EOR are relevant parameters for the assessment of the validity of a clinical test. They have received increasing attention in the last few years and a huge amount of literature has been produced. Hence, we refrain from any attempt at a description and refer the interested reader to, for example, [Kno01].

The meaning of the third ratio $r_=$ is not straightforward, as explained in [BFH75, page 21]. However, its statistical meaning can be derived using Equations (2) and (3) in Section 1.

Finally, we remark that the geometric structure of the statistical models for case-control studies is very simple. From the results in Section 2 one readily sees that the models are segments or portions of ruled quadratic surfaces. Moreover, from a Bayesian point of view, Propositions 2.2 and 2.6 allow one to compute the exact range of the free odds ratio.

Acknowledgement. We wish to thank an anonymous referee for his/her valuable 
suggestions and comments for the improvement of the paper. 